RapidML API

Getting started with RapidML is easy. All RapidML functions return an object of the rml class.

The RapidML.rml Class

rml class attributes

model :

This is the machine learning model produced by RapidML. It has already been trained on the training data and target supplied by the user, either as a single DataFrame or as X, y arrays, where X is the training data and y is the target. This attribute is never null.

m_tpot :

Note: This may be null, depending on which function created the rml object (see the Returns section of each function below). This is a TPOT object, either a TPOTClassifier or a TPOTRegressor, which RapidML uses to find the optimal machine learning model for the supplied data.

You can use the functions and attributes of rml.m_tpot to evaluate the trained model. For example, rml.m_tpot.score(testing_features, testing_classes) evaluates the model on held-out testing data and returns its score (accuracy, by default, for a TPOTClassifier). See the TPOT documentation for all the functions and attributes available on rml.m_tpot.

d :

Note: This may be null, depending on which function created the rml object (see the Returns section of each function below).

This is a defaultdict holding the LabelEncoder(s) that map the original labels to their encoded values. It is populated only if we choose to label encode the table. See sklearn.preprocessing.LabelEncoder for more details.
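A minimal sketch of the per-column pattern, assuming that d maps each column name to its own encoder. SimpleLabelEncoder is a hypothetical, stdlib-only stand-in for sklearn.preprocessing.LabelEncoder (which likewise assigns codes in the sorted order of the distinct labels), and a dict of lists stands in for the DataFrame. With pandas and scikit-learn the same idea is commonly written as d = defaultdict(LabelEncoder) followed by df.apply(lambda x: d[x.name].fit_transform(x)); whether RapidML uses exactly this is not stated here.

```python
from collections import defaultdict


class SimpleLabelEncoder:
    """Hypothetical stand-in for sklearn's LabelEncoder: sorted labels -> 0..n-1."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {label: code for code, label in enumerate(self.classes_)}
        return [index[v] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


# Hypothetical table: column name -> column values.
table = {
    "colour": ["red", "blue", "red", "green"],
    "size": ["S", "L", "M", "S"],
}

# A fresh encoder is created the first time each column name is looked up.
d = defaultdict(SimpleLabelEncoder)
encoded = {col: d[col].fit_transform(values) for col, values in table.items()}

print(encoded["colour"])     # [2, 0, 2, 1]
print(d["colour"].classes_)  # ['blue', 'green', 'red']
```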

rml class functions

put(self, mdl, d=None) :

This method is used by the RapidML functions to assign the attributes of rml objects. Here mdl is either a model supplied by the user or a TPOT object created by RapidML.

If mdl is a TPOT object, the model attribute is set to mdl.fitted_pipeline_ (the best pipeline TPOT found for the training data) and the m_tpot attribute is set to mdl itself. If mdl is instead an already fitted (trained) machine learning model, the model attribute is set to mdl and the m_tpot attribute is null.

If we decide to labelencode the training data, then the d attribute will be the d supplied as the function argument. Otherwise, the d attribute will be null.
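The assignment rules above can be sketched as follows. This is a hypothetical re-implementation, not RapidML's source: Rml, FakeTPOT and FakeEstimator are illustrative names, and recognising a TPOT object by the presence of a fitted_pipeline_ attribute is an assumption (the real code may check the type instead).

```python
class Rml:
    """Hypothetical re-implementation of the put() rules (not RapidML's source)."""

    def __init__(self):
        self.model = None
        self.m_tpot = None
        self.d = None

    def put(self, mdl, d=None):
        # Assumption: a TPOT object is recognised by its fitted_pipeline_ attribute.
        if hasattr(mdl, "fitted_pipeline_"):
            self.model = mdl.fitted_pipeline_  # best pipeline found by TPOT
            self.m_tpot = mdl                  # keep the TPOT object itself
        else:
            self.model = mdl                   # an already fitted estimator
            self.m_tpot = None                 # no TPOT object involved
        self.d = d                             # None unless label encoding was done


class FakeTPOT:
    """Stand-in for a fitted TPOTClassifier or TPOTRegressor."""

    def __init__(self, pipeline):
        self.fitted_pipeline_ = pipeline


class FakeEstimator:
    """Stand-in for a fitted scikit-learn model."""


tpot = FakeTPOT(pipeline="best-pipeline")
r1 = Rml()
r1.put(tpot, d={"colour": "encoder"})
print(r1.model, r1.m_tpot is tpot)  # best-pipeline True

est = FakeEstimator()
r2 = Rml()
r2.put(est)
print(r2.model is est, r2.m_tpot, r2.d)  # True None None
```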

le(self, df) :

The user can call this method on an rml object to label encode another dataset, using the same encoding table that was applied to a previous, similar dataset.

For example, to apply the same label transformation to two DataFrames that have the same columns but different rows, first label encode the first table, then call this function to label encode the second table.
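The idea behind le() can be sketched as: fit the encoders once on the first table, then only call transform on the second, so identical labels receive identical codes. SimpleLabelEncoder and encode_with are hypothetical names. The sketch also raises ValueError for a label that never appeared in the first table, mirroring sklearn's LabelEncoder.transform; we assume, but the text above does not confirm, that rml.le behaves the same way.

```python
class SimpleLabelEncoder:
    """Hypothetical stdlib stand-in for sklearn.preprocessing.LabelEncoder."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {label: code for code, label in enumerate(self.classes_)}
        unseen = [v for v in values if v not in index]
        if unseen:
            # sklearn's LabelEncoder also raises ValueError for unseen labels.
            raise ValueError(f"previously unseen labels: {unseen}")
        return [index[v] for v in values]


def encode_with(encoders, table):
    """Encode each column of `table` with the encoder already fitted for that column."""
    return {col: encoders[col].transform(values) for col, values in table.items()}


first = {"colour": ["red", "blue", "green"]}
second = {"colour": ["green", "green", "red"]}  # same column, different rows

# Fit once on the first table only.
encoders = {col: SimpleLabelEncoder().fit(values) for col, values in first.items()}
print(encode_with(encoders, second))  # {'colour': [1, 1, 2]}

try:
    encode_with(encoders, {"colour": ["purple"]})
except ValueError as err:
    print(err)  # previously unseen labels: ['purple']
```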

RapidML.rapid_classifier

rapid_classifier optionally performs label encoding on the input DataFrame df (controlled by the le parameter). It then uses a TPOT backend, which runs a genetic-programming search, to find and optimize the best classifier for the input data. Finally, it populates the attributes of an rml object and returns that object.

Parameters

df

Type: pandas.DataFrame

The input DataFrame supplied by the user. The training features occupy the initial columns and the target occupies the last column.
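Because the target must be the last column, separating df into features and target amounts to slicing off that column. The sketch below uses a hypothetical helper, split_features_target, on plain lists standing in for DataFrame rows; with pandas the equivalent is df.iloc[:, :-1] for the features and df.iloc[:, -1] for the target. It illustrates the expected layout, not RapidML's internal code.

```python
def split_features_target(rows):
    """Split rows laid out as [feature_1, ..., feature_k, target] into (X, y)."""
    X = [row[:-1] for row in rows]  # every column except the last
    y = [row[-1] for row in rows]   # the last column is the target
    return X, y


rows = [
    [5.1, 3.5, "setosa"],
    [6.2, 2.9, "versicolor"],
]
X, y = split_features_target(rows)
print(X)  # [[5.1, 3.5], [6.2, 2.9]]
print(y)  # ['setosa', 'versicolor']
```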

le

Type: str

The default value is 'Yes'. If le is 'Yes', RapidML label encodes the input DataFrame df and stores the LabelEncoder in a defaultdict. If le is 'No', no label encoding is done. Any other value of le raises a ValueError.
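A sketch of the le check described above. should_label_encode is a hypothetical name, and treating the comparison as case-sensitive (so 'yes' is rejected) is an assumption.

```python
def should_label_encode(le="Yes"):
    """Translate the le flag into a boolean; reject anything but 'Yes' or 'No'."""
    if le == "Yes":
        return True
    if le == "No":
        return False
    # Assumption: the comparison is case-sensitive, so 'yes' is rejected too.
    raise ValueError(f"le must be 'Yes' or 'No', got {le!r}")


print(should_label_encode())      # True
print(should_label_encode("No"))  # False
try:
    should_label_encode("maybe")
except ValueError as err:
    print(err)  # le must be 'Yes' or 'No', got 'maybe'
```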

model

Type: tpot.TPOTClassifier

The default value is tpot.TPOTClassifier(generations=5, population_size=50, verbosity=2). You can pass a TPOTClassifier object configured with different parameters as required. In general, increasing generations and population_size lets TPOT explore more pipelines, which tends to improve the model's accuracy at the cost of a longer search. See TPOTClassifier for more details.
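To make the cost side of that trade-off concrete: the TPOT documentation states that a run evaluates population_size + generations × offspring_size pipelines in total, with offspring_size defaulting to population_size. The hypothetical helper below (not part of TPOT or RapidML) only does that arithmetic, for the default configuration and for a larger one.

```python
def pipelines_evaluated(generations, population_size, offspring_size=None):
    """Total pipelines TPOT evaluates: population_size + generations * offspring_size."""
    if offspring_size is None:
        # TPOT's offspring_size defaults to population_size.
        offspring_size = population_size
    return population_size + generations * offspring_size


print(pipelines_evaluated(5, 50))    # 300  (the default configuration above)
print(pipelines_evaluated(10, 100))  # 1100
```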

name

Type: str

Default value is "RapidML_Files". This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a datatype list and a dummy user input as serialized dill files, together with the API.py and helper.py scripts. Upload this directory to a web server to serve (use) the RapidML-generated model for making predictions via the web API.

Returns

Returns an rml object. If le is 'Yes', rml.d is populated; otherwise it is null. When using rapid_classifier, rml.model and rml.m_tpot are always populated.

Files Created

model

This is the machine learning model generated by RapidML, saved after being serialized via dill.

d

This is the defaultdict (a dict subclass) containing the LabelEncoder used to encode the labels in the DataFrame. It has been saved after serialization via dill.

df

This is the skeletal DataFrame, which contains only headers and no data. It has been saved after serialization via dill.

dt

This is a list containing the data types of the columns in the input DataFrame. It has been saved after serialization via dill.

f

This is a string containing a dummy input value that can be passed to the API as a URL argument. It is the second row of the input DataFrame, converted to a string. It has been saved after serialization via dill.

API.py

This is the actual Flask API used by the server for accepting user inputs, making predictions on the basis of those inputs and returning the predictions.

helper.py

This is a helper module used by API.py that performs the actual predictions using the RapidML-generated model.
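A sketch of how such a directory could be written and read back. It assumes each artifact is stored in a file named exactly model, d, df, dt and f (the real file names and extensions may differ), uses the standard pickle module as a stand-in for dill (dill exposes the same dump/load interface and can additionally serialize objects such as lambdas), and fills the artifacts with placeholder values rather than a real pipeline or LabelEncoder.

```python
import os
import pickle  # stand-in for dill, which offers the same dump()/load() interface
import tempfile


def save_artifacts(directory, **artifacts):
    """Serialize each keyword argument to <directory>/<name> (hypothetical layout)."""
    os.makedirs(directory, exist_ok=True)
    for filename, obj in artifacts.items():
        with open(os.path.join(directory, filename), "wb") as fh:
            pickle.dump(obj, fh)


def load_artifact(directory, filename):
    """Read one serialized artifact back."""
    with open(os.path.join(directory, filename), "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "RapidML_Files")  # the default `name`
    save_artifacts(
        out_dir,
        model={"placeholder": "fitted pipeline"},  # would be the trained model
        d={"colour": ["blue", "green", "red"]},   # would be the LabelEncoder defaultdict
        df=["colour", "size", "label"],           # would be the header-only DataFrame
        dt=["object", "float64", "object"],       # one dtype name per column
        f="red,1.5",                              # hypothetical dummy input string
    )
    listing = sorted(os.listdir(out_dir))
    restored_dt = load_artifact(out_dir, "dt")

print(listing)      # ['d', 'df', 'dt', 'f', 'model']
print(restored_dt)  # ['object', 'float64', 'object']
```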

RapidML.rapid_regressor

rapid_regressor optionally performs label encoding on the input DataFrame df (controlled by the le parameter). It then uses a TPOT backend, which runs a genetic-programming search, to find and optimize the best regressor for the input data. Finally, it populates the attributes of an rml object and returns that object.

Parameters

df

Type: pandas.DataFrame

The input DataFrame supplied by the user. The training features occupy the initial columns and the target occupies the last column.

le

Type: str

The default value is 'No'. If le is 'Yes', RapidML label encodes the input DataFrame df and stores the LabelEncoder in a defaultdict. If le is 'No', no label encoding is done. Any other value of le raises a ValueError.
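Label encoding replaces each distinct value with its position among the sorted distinct values. Applied to a continuous numeric column, such as a regression target, it keeps the order of the values but discards their magnitudes, which is one reason to leave le at 'No' when the data is already numeric. The hypothetical rank_codes function below mimics that mapping with the standard library.

```python
def rank_codes(values):
    """Label-encode like LabelEncoder: code = position in the sorted distinct values."""
    classes = sorted(set(values))
    index = {value: code for code, value in enumerate(classes)}
    return [index[v] for v in values]


prices = [2.5, 10.0, 3.75, 100.0]
codes = rank_codes(prices)
print(codes)  # [0, 2, 1, 3]
# 10.0 -> 100.0 is a tenfold jump, yet the codes differ by only 1: magnitude is lost.
```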

model

Type: tpot.TPOTRegressor

The default value is tpot.TPOTRegressor(generations=5, population_size=50, verbosity=2). You can pass a TPOTRegressor object configured with different parameters as required. In general, increasing generations and population_size lets TPOT explore more pipelines, which tends to improve the model's predictive performance at the cost of a longer search. See TPOTRegressor for more details.

name

Type: str

Default value is "RapidML_Files". This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a datatype list and a dummy user input as serialized dill files, together with the API.py and helper.py scripts. Upload this directory to a web server to serve (use) the RapidML-generated model for making predictions via the web API.

Returns

Returns an rml object. If le is 'Yes', rml.d is populated; otherwise it is null. When using rapid_regressor, rml.model and rml.m_tpot are always populated.

Files Created

model

This is the machine learning model generated by RapidML, saved after being serialized via dill.

d

This is the defaultdict (a dict subclass) containing the LabelEncoder used to encode the labels in the DataFrame. It has been saved after serialization via dill.

df

This is the skeletal DataFrame, which contains only headers and no data. It has been saved after serialization via dill.

dt

This is a list containing the data types of the columns in the input DataFrame. It has been saved after serialization via dill.

f

This is a string containing a dummy input value that can be passed to the API as a URL argument. It is the second row of the input DataFrame, converted to a string. It has been saved after serialization via dill.

API.py

This is the actual Flask API used by the server for accepting user inputs, making predictions on the basis of those inputs and returning the predictions.

helper.py

This is a helper module used by API.py that performs the actual predictions using the RapidML-generated model.
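One plausible way to use these files when handling a request is to split the URL argument (in the same form as f) into fields and cast each field using dt before the model is applied. The sketch below shows only that parsing step, and everything in it is an assumption: the route, the input parameter name, the comma-separated format and the dtype-to-type table are illustrative, and RapidML's actual API.py and helper.py may work differently.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from the dtype names stored in dt to Python casts.
CASTS = {"int64": int, "float64": float, "object": str}


def parse_input(url, dtypes, param="input"):
    """Turn a comma-separated URL argument into a typed feature row (assumed format)."""
    raw = parse_qs(urlparse(url).query)[param][0]
    fields = raw.split(",")
    if len(fields) != len(dtypes):
        raise ValueError(f"expected {len(dtypes)} values, got {len(fields)}")
    return [CASTS[dtype](field) for dtype, field in zip(dtypes, fields)]


row = parse_input(
    "http://localhost:5000/predict?input=red,1.5,3",
    ["object", "float64", "int64"],
)
print(row)  # ['red', 1.5, 3]
```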

RapidML.rapid_classifier_arr

The rapid_classifier_arr function is similar to rapid_classifier, except that rather than receiving the features and target as a single input DataFrame, it receives the features as X (type numpy.array) and the target as Y (type numpy.array). Another important difference is that this function doesn't perform label encoding.
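X and Y carry the same information as the df accepted by rapid_classifier, only split apart. The sketch below (the helper to_combined_rows is hypothetical, and plain lists stand in for numpy arrays) rebuilds the DataFrame-style layout, features first and target last, and checks that X and Y have the same number of rows. With numpy, numpy.column_stack((X, Y)) performs the same join for numeric data.

```python
def to_combined_rows(X, Y):
    """Append each target value to its feature row: features first, target last."""
    if len(X) != len(Y):
        raise ValueError(f"X has {len(X)} rows but Y has {len(Y)} values")
    return [list(features) + [target] for features, target in zip(X, Y)]


X = [[5.1, 3.5], [6.2, 2.9]]
Y = [0, 1]
print(to_combined_rows(X, Y))  # [[5.1, 3.5, 0], [6.2, 2.9, 1]]
```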

Parameters

X

Type: numpy.array or array-like

This is the input data.

Y

Type: numpy.array or array-like

This is the target.

model

Type: tpot.TPOTClassifier

Default value is TPOTClassifier(generations=5, population_size=50, verbosity=2). You can pass a TPOTClassifier object configured with different parameters as required. In general, increasing generations and population_size lets TPOT explore more pipelines, which tends to improve the model's accuracy at the cost of a longer search. See TPOTClassifier for more details.

name

Type: str

Default value is "RapidML_Files". This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a datatype list and a dummy user input as serialized dill files, together with the API.py and helper.py scripts. Upload this directory to a web server to serve (use) the RapidML-generated model for making predictions over the internet.

Returns

Returns an rml object. rml.d is always null; rml.model and rml.m_tpot are always populated.

Files Created

model

This is the machine learning model generated by RapidML, saved after being serialized via dill.

API.py

This is the actual Flask API used by the server for accepting user inputs, making predictions on the basis of the inputs and returning the predictions.

RapidML.rapid_regressor_arr

The rapid_regressor_arr function is similar to rapid_regressor, except that rather than receiving the features and target as a single input DataFrame, it receives the features as X (type numpy.array) and the target as Y (type numpy.array). Another important difference is that this function doesn't perform label encoding.

Parameters

X

Type: numpy.array or array-like

This is the input data.

Y

Type: numpy.array or array-like

This is the target.

model

Type: tpot.TPOTRegressor

Default value is TPOTRegressor(generations=5, population_size=50, verbosity=2). You can pass a TPOTRegressor object configured with different parameters as required. In general, increasing generations and population_size lets TPOT explore more pipelines, which tends to improve the model's predictive performance at the cost of a longer search. See TPOTRegressor for more details.

name

Type: str

Default value is "RapidML_Files". This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a datatype list and a dummy user input as serialized dill files, together with the API.py and helper.py scripts. Upload this directory to a web server to serve (use) the RapidML-generated model for making predictions over the internet.

Returns

Returns an rml object. rml.d is always null; rml.model and rml.m_tpot are always populated.

Files Created

model

This is the machine learning model generated by RapidML, saved after being serialized via dill.

API.py

This is the actual Flask API used by the server for accepting user inputs, making predictions on the basis of the inputs and returning the predictions.

RapidML.rapid_udm

rapid_udm makes RapidML a versatile tool in the hands of experienced data scientists and developers. It works like rapid_regressor and rapid_classifier: a single DataFrame is passed that contains the input data as well as the target (in the last column).

However, it lets the user provide a scikit-learn model of their choice. Label encoding is performed or skipped according to the le parameter. The supplied model is then fitted (trained) on the input data and stored in the rml.model attribute.

Parameters

df

Type: pandas.DataFrame

The input DataFrame supplied by the user. The training features occupy the initial columns and the target occupies the last column.

model

Type: sklearn model

This may be any model that supports the scikit-learn call model.fit(X, y), where X is the input data and y is the target.
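Only fit(X, y) is documented as a requirement. The hypothetical MajorityClassifier below is a deliberately tiny model that follows the scikit-learn convention that fit returns self; it also implements predict, on the assumption that the served API needs it to make predictions. In practice you would pass a real estimator such as sklearn.ensemble.RandomForestClassifier.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical minimal model: always predicts the most frequent training label."""

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        # Remember the most common target value seen during training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X):
        return [self.majority_ for _ in X]


model = MajorityClassifier().fit([[1], [2], [3]], ["a", "b", "a"])
print(model.predict([[4], [5]]))  # ['a', 'a']
```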

le

Type: str

The default value is 'Yes'. If le is 'Yes', RapidML label encodes the input DataFrame df and stores the LabelEncoder in a defaultdict. If le is 'No', no label encoding is done. Any other value of le raises a ValueError.

name

Type: str

Default value is "RapidML_Files". This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a datatype list and a dummy user input as serialized dill files, together with the API.py and helper.py scripts. Upload this directory to a web server to serve (use) the RapidML-generated model for making predictions via the web API.

Returns

Returns an rml object. If le is 'Yes', rml.d is populated; otherwise it is null. rml.model is always populated, while rml.m_tpot is always null.

Files Created

model

This is the machine learning model fitted by RapidML, saved after being serialized via dill.

d

This is the defaultdict (a dict subclass) containing the LabelEncoder used to encode the labels in the DataFrame. It has been saved after serialization via dill.

df

This is the skeletal DataFrame, which contains only headers and no data. It has been saved after serialization via dill.

dt

This is a list containing the data types of the columns in the input DataFrame. It has been saved after serialization via dill.

f

This is a string containing a dummy input value that can be passed to the API as a URL argument. It is the second row of the input DataFrame, converted to a string. It has been saved after serialization via dill.

API.py

This is the actual Flask API used by the server for accepting user inputs, making predictions on the basis of those inputs and returning the predictions.

helper.py

This is a helper module used by API.py that performs the actual predictions using the RapidML-generated model.

RapidML.rapid_udm_arr

The rapid_udm_arr function is similar to rapid_udm, except that rather than receiving the features and target as a single input DataFrame, it receives the features as X (type numpy.array) and the target as Y (type numpy.array). Another important difference is that this function doesn't perform label encoding.

Parameters

X

Type: numpy.array or array-like

This is the input data.

Y

Type: numpy.array or array-like

This is the target.

model

Type: sklearn model

This may be any model that supports the scikit-learn call model.fit(X, y), where X is the input data and y is the target.

name

Type: str

Default value is "RapidML_Files". This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a datatype list and a dummy user input as serialized dill files, together with the API.py and helper.py scripts. Upload this directory to a web server to serve (use) the RapidML-generated model for making predictions over the internet.

Returns

Returns an rml object. rml.model is always populated; rml.d and rml.m_tpot are always null.

Files Created

model

This is the machine learning model fitted by RapidML, saved after being serialized via dill.

API.py

This is the actual Flask API used by the server for accepting user inputs, making predictions on the basis of the inputs and returning the predictions.